Skip to content

feat(skills): add synonym expansion to gws skills search#514

Open
dumko2001 wants to merge 15 commits intogoogleworkspace:mainfrom
dumko2001:feature/skills-search-synonyms
Open

feat(skills): add synonym expansion to gws skills search#514
dumko2001 wants to merge 15 commits intogoogleworkspace:mainfrom
dumko2001:feature/skills-search-synonyms

Conversation

@dumko2001
Copy link
Contributor

Description

Extends gws skills search (introduced in #507) with a static synonym expansion table so agents don't need to know exact API names to find the right skill.

Dependency: This PR builds on gws skills search from #507 and should be merged after that PR lands.

Problem: gws skills search email returned no results because "email" does not appear literally in the Gmail service's api_name ("gmail") or aliases. Agents using natural language to discover skills were forced to guess canonical names.

Solution: A static SYNONYMS table maps 30+ common terms to their canonical service names. Each query token is expanded before matching using token-AND / synonym-OR semantics — every token must match, but it is satisfied if any of its expansions appears in the skill's fields.

Search query Matches
email Gmail (emailgmail)
spreadsheet Sheets (spreadsheetsheets)
schedule Calendar (schedulecalendar)
document Docs (documentdocs)
presentation Slides (presentationslides)
send email gws-gmail-send helper
upload file gws-drive-upload helper

Dry Run Output:

$ gws skills search email
Searching for skills matching "email"...

[Service] gws-gmail - Send, read, and manage email
  Reference: skills/references/gws-gmail/SKILL.md

[Helper] gws-gmail-send - Send an email
  Reference: skills/references/gws-gmail-send/SKILL.md

[Helper] gws-gmail-triage - Show unread inbox summary (sender, subject, date)
  Reference: skills/references/gws-gmail-triage/SKILL.md

[Helper] gws-gmail-reply - Reply to a message (handles threading automatically)
  Reference: skills/references/gws-gmail-reply/SKILL.md

[Helper] gws-gmail-reply-all - Reply-all to a message (handles threading automatically)
  Reference: skills/references/gws-gmail-reply-all/SKILL.md

[Helper] gws-gmail-forward - Forward a message to new recipients
  Reference: skills/references/gws-gmail-forward/SKILL.md

[Helper] gws-gmail-watch - Watch for new emails and stream them as NDJSON
  Reference: skills/references/gws-gmail-watch/SKILL.md

[Recipe] recipe-label-and-archive-emails - Label and Archive Gmail Threads
  Description: Apply Gmail labels to matching messages and archive them to keep your inbox clean.
  Skill: skills/recipe-label-and-archive-emails/SKILL.md

Found 8 matching skills.

(This is a local discovery command — no JSON API request is produced, so no --dry-run JSON applies.)

Checklist:

  • My code follows the AGENTS.md guidelines (no generated google-* crates).
  • I have run cargo fmt --all to format the code perfectly.
  • I have run cargo clippy -- -D warnings and resolved all warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have provided a Changeset file (e.g. via pnpx changeset) to document my changes.

Add explicit --help/-h handling to `handle_skills_command` so that
`gws skills --help` and `gws skills search --help` print a proper
help screen instead of treating the flag as a search query.

Keeps the same manual dispatch pattern as all other top-level commands
in main.rs rather than introducing a full clap subcommand tree for a
single new command.
Per CLI convention, invoking a subcommand with no arguments should display
usage/help rather than return an error. Since 'gws skills --help', 'gws skills',
and 'gws skills <unknown-subcommand>' all indicate the user wants guidance,
route all non-search invocations to print_skills_help() and return Ok(()).
Replace single-string substring match with a token-AND approach: split the
query into individual tokens and require all tokens to appear somewhere in
the combined fields (name + description + aliases).

Previously, `gws skills search send email` built the query string "send email"
and required that exact contiguous phrase to appear in a field. This failed
for descriptions like "Gmail: Send an email" where the words are present but
not adjacent. With token-AND matching, both tokens must match anywhere across
the combined text, which is the correct behavior for natural-language queries.
Add a static SYNONYMS table mapping 30+ natural-language terms to
their canonical service names. When a query token matches a synonym
entry it is expanded to also include the canonical form, so:

  gws skills search email       → finds Gmail (email → gmail)
  gws skills search spreadsheet → finds Sheets (spreadsheet → sheets)
  gws skills search schedule    → finds Calendar (schedule → calendar)
  gws skills search document    → finds Docs (document → docs)

The matching logic uses token-AND with synonym-OR per token: every
original query token must be satisfied, but it is satisfied if any of
its synonym expansions appears in the skill's fields. An exact match
on the canonical name still works as before.

Adds expand_tokens() as a testable helper with 8 unit tests covering
no-synonym pass-through, multi-word expansion, deduplication, and
table integrity checks.
@dumko2001 dumko2001 requested a review from jpoehnelt as a code owner March 16, 2026 12:24
@googleworkspace-bot googleworkspace-bot added area: skills area: docs area: core Core CLI parsing, commands, error handling, utilities labels Mar 16, 2026
@changeset-bot
Copy link

changeset-bot bot commented Mar 16, 2026

🦋 Changeset detected

Latest commit: e2cec4d

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the discoverability of skills within the gws CLI by introducing natural language synonym expansion for the skills search command. It also reorganizes the underlying skill documentation into a more structured hierarchy, improving clarity and maintainability. These changes collectively make it easier for users to find and utilize the appropriate tools for their tasks.

Highlights

  • Synonym Expansion for Skill Search: Implemented a static synonym expansion table for gws skills search, allowing agents to use natural language terms (e.g., 'email' for Gmail, 'spreadsheet' for Sheets) to find relevant skills without needing exact API names. This uses 'token-AND / synonym-OR' semantics for matching.
  • Skill Documentation Restructuring: Restructured the skills/ directory into a hierarchical skills/references/ subdirectory for core API services and helpers. This change aims to prevent agent context pollution and improve organization.
  • Skill Generation and Link Validation Enhancements: Refactored the generate-skills logic to automate artifact generation, support filtering during generation, and update internal skill linking. A new Python script (check_links.py) was added to validate local markdown links within skill documentation.
  • New gws skills search Command: Introduced the gws skills search <query> command, providing a dedicated interface for discovering skills across services, helpers, personas, and recipes.
Changelog
  • .changeset/skill-optimization.md
    • Added a changeset file documenting the restructuring of skills into a hierarchical references/ subdirectory.
    • Added a changeset file documenting the new gws skills search <query> command for semantic/keyword discovery.
  • .changeset/skills-search-synonyms.md
    • Added a changeset file documenting the addition of synonym expansion to gws skills search.
Activity
  • The pull request introduces a new feature for skill search with synonym expansion.
  • It includes significant refactoring of skill generation and documentation structure.
  • A new Python script for link checking was added to ensure documentation integrity.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Generative AI Prohibited Use Policy, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a synonym expansion feature for the new gws skills search command, making it easier for users to discover skills with natural language queries. It also includes a significant refactoring that moves generated API and helper skills into a references/ subdirectory, improving the project's structure. The implementation is generally strong, with good test coverage for the new search logic.

I've identified one high-severity issue related to how search query arguments are parsed. Quoted queries are not correctly tokenized, which can lead to failed searches. I've provided a code suggestion to address this. Otherwise, the changes, including the large-scale refactoring and the addition of a link-checking script, are well-executed.

// Split into individual tokens so multi-word queries like "send email" match
// descriptions where the words appear separately (e.g. "Send an email").
// Then expand each token via the synonym table so "email" also matches "gmail".
let raw_tokens: Vec<String> = args[1..].iter().map(|a| a.to_lowercase()).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation for parsing search tokens doesn't correctly handle quoted multi-word queries. For example, a command like gws skills search "send email" would incorrectly treat "send email" as a single token instead of two separate tokens, send and email. This leads to incorrect search behavior as it would search for the literal phrase rather than individual keywords.

To ensure both quoted and unquoted queries are handled correctly, you should first join all arguments into a single string and then split that string by whitespace.

Suggested change
let raw_tokens: Vec<String> = args[1..].iter().map(|a| a.to_lowercase()).collect();
let raw_tokens: Vec<String> = args[1..].join(" ").split_whitespace().map(|a| a.to_lowercase()).collect();

@dumko2001
Copy link
Contributor Author

Fixed the tokenization of quoted queries in gws skills search. Previously, searching for "send email" (quoted) would fail to match descriptions like "Send an email" because it looked for the exact phrase. Now, each argument is split by whitespace into individual tokens, ensuring consistent behavior regardless of quoting.

@googleworkspace-bot
Copy link
Collaborator

/gemini review

@dumko2001
Copy link
Contributor Author

Updated persona and recipe skill generation to use relative Markdown links in the PREREQUISITE section and correct skill identifiers (without the directory prefix) in the metadata requires.skills array. This resolves feedback regarding incorrect path references in generated skills.

@googleworkspace-bot
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a great enhancement to skill discovery! The synonym expansion will significantly improve the user experience for agents searching for skills. The implementation is solid, with good test coverage for the new logic. The refactoring to centralize registry definitions and improve error handling in the skill generation script is also a welcome improvement. I have one suggestion regarding performance in the new search command.

Comment on lines +170 to +197
for svc in services::SERVICES {
if let Some(helper) = helpers::get_helper(svc.api_name) {
let cli = Command::new(svc.api_name);
let doc = discovery::RestDescription {
name: svc.api_name.to_string(),
..Default::default()
};
let cli_with_helpers = helper.inject_commands(cli, &doc);
for sub in cli_with_helpers.get_subcommands() {
let name = sub.get_name();
if name.starts_with('+') {
let short_name = name.trim_start_matches('+');
let full_helper_name = format!("gws-{}-{}", svc.aliases[0], short_name);
let about = sub.get_about().map(|s| s.to_string()).unwrap_or_default();
let about_clean = about.strip_prefix("[Helper] ").unwrap_or(&about);

if matches(&[full_helper_name.as_str(), about_clean]) {
println!("[Helper] {} - {}", full_helper_name, about_clean);
println!(
" Reference: skills/references/{}/SKILL.md\n",
full_helper_name
);
results += 1;
}
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation for searching helpers rebuilds the clap::Command tree for every service on every search invocation. This is inefficient and can slow down the skills search command, as it performs redundant computations on every run.

To improve performance, you can compute the list of all helpers once and cache it for subsequent searches. The once_cell::sync::Lazy pattern is a great fit for this. You would need to add once_cell as a dependency to your Cargo.toml.

Here's an example of how you could implement this optimization:

  1. Add use once_cell::sync::Lazy; and define a struct for helper information at the module level.
  2. Create a Lazy static to build the helper list once.
  3. Replace the current helper search loop with a simple iteration over this static HELPERS list.
use once_cell::sync::Lazy;
// ... other imports

struct HelperInfo {
    full_name: String,
    description: String,
}

static HELPERS: Lazy<Vec<HelperInfo>> = Lazy::new(|| {
    let mut helpers = Vec::new();
    for svc in services::SERVICES {
        if let Some(helper) = helpers::get_helper(svc.api_name) {
            let cli = Command::new(svc.api_name);
            let doc = crate::discovery::RestDescription {
                name: svc.api_name.to_string(),
                ..Default::default()
            };
            let cli_with_helpers = helper.inject_commands(cli, &doc);
            for sub in cli_with_helpers.get_subcommands() {
                if sub.get_name().starts_with('+') {
                    let short_name = sub.get_name().trim_start_matches('+');
                    let full_helper_name = format!("gws-{}-{}", svc.aliases[0], short_name);
                    let about = sub.get_about().unwrap_or_default();
                    let about_clean = about.strip_prefix("[Helper] ").unwrap_or(about);
                    helpers.push(HelperInfo {
                        full_name: full_helper_name,
                        description: about_clean.to_string(),
                    });
                }
            }
        }
    }
    helpers
});

// In handle_skills_command, replace the loop with:
// for h in HELPERS.iter() { ... }

This change will make the search command significantly faster, especially as the number of services and helpers grows.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable synonym expansion feature for skill search, significantly improving usability for agents. The accompanying refactoring of the skill file structure is substantial but well-executed, creating a more organized references directory. The implementation is generally strong, but I've identified a critical issue in the new link-checking script that would cause it to fail, and a high-severity bug in the search command where an empty query incorrectly matches all skills. My review includes suggestions to fix both issues.

if not target_path.exists():
print(f"Broken link in {filepath}: {link_path} (resolved to {target_path})")
broken_links += 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This indented empty line will cause a Python IndentationError and prevent the script from running. It should be removed.

raw_tokens.push(token.to_lowercase());
}
}
let query_display = raw_tokens.join(" ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

An empty or whitespace-only search query will currently match all skills. This is because token_candidates will be empty, and the .all() call on an empty iterator returns true. You should add a check to handle this case, for example by returning an error or showing help.

Suggested change
let query_display = raw_tokens.join(" ");
if raw_tokens.is_empty() {
return Err(GwsError::Validation(
"Search query cannot be empty or only whitespace.".to_string(),
));
}
let query_display = raw_tokens.join(" ");

@googleworkspace-bot
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable synonym expansion feature to the gws skills search command, making it much easier for users to discover skills using natural language terms. The implementation in skills_search.rs is well-structured.

The PR also includes a significant refactoring, moving the generated API and helper skills into a skills/references/ subdirectory. While this improves the project's structure by separating reference skills from composable ones like personas and recipes, it's a large change that is arguably outside the primary scope of adding synonym search. As per the repository's general rules, avoid introducing changes that are outside the primary goal of a pull request to prevent scope creep. In the future, it would be better to separate large refactorings from feature additions into their own pull requests to keep reviews focused and easier to manage.

I've found one high-severity issue where an empty search query incorrectly returns all skills. Please see my detailed comment.

Comment on lines +146 to +151
let matches = |fields: &[&str]| -> bool {
let combined = fields.join(" ").to_lowercase();
token_candidates
.iter()
.all(|candidates| candidates.iter().any(|c| combined.contains(c.as_str())))
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

An empty or whitespace-only search query (e.g., gws skills search "") incorrectly matches and displays all available skills. This happens because token_candidates becomes an empty vector, and the all() call on an empty iterator returns true, causing the matches closure to always return true.

To fix this, you can add a check at the beginning of the matches closure to handle the case of an empty query.

    let matches = |fields: &[&str]| -> bool {
        if token_candidates.is_empty() {
            return false;
        }
        let combined = fields.join(" ").to_lowercase();
        token_candidates
            .iter()
            .all(|candidates| candidates.iter().any(|c| combined.contains(c.as_str())))
    };

@googleworkspace-bot
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a synonym expansion feature to the gws skills search command, significantly improving its usability for natural language queries. It also includes a major refactoring of the skill generation logic, moving generated skill files into a skills/references subdirectory to separate them from user-facing persona and recipe skills. This is a great structural improvement. The PR also adds a new Python script to validate markdown links, which will help maintain documentation quality. The implementation is well-structured, with new logic for search and registries cleanly separated into new modules. I've found one high-severity issue where a whitespace-only query would incorrectly match all skills, and I've provided a suggestion to fix it.

Comment on lines +125 to +131
let mut raw_tokens = Vec::new();
for arg in &args[1..] {
for token in arg.split_whitespace() {
raw_tokens.push(token.to_lowercase());
}
}
let query_display = raw_tokens.join(" ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

A search query consisting only of whitespace characters will result in raw_tokens being empty. This causes the search logic to match all skills because token_candidates.iter().all() on an empty iterator returns true.

To fix this, you should add a check for empty raw_tokens after parsing the arguments.

I've also suggested a small refactoring to use iterators for a more concise way to build raw_tokens.

Suggested change
let mut raw_tokens = Vec::new();
for arg in &args[1..] {
for token in arg.split_whitespace() {
raw_tokens.push(token.to_lowercase());
}
}
let query_display = raw_tokens.join(" ");
let raw_tokens: Vec<String> = args[1..]
.iter()
.flat_map(|arg| arg.split_whitespace())
.map(|s| s.to_lowercase())
.collect();
if raw_tokens.is_empty() {
return Err(GwsError::Validation(
"Search query cannot be empty. Usage: gws skills search <query>".to_string(),
));
}
let query_display = raw_tokens.join(" ");

@dumko2001
Copy link
Contributor Author

Fixed a high-severity bug where whitespace-only or empty search queries would incorrectly match all skills. Now, an error is returned if no valid search tokens are identified.

@googleworkspace-bot
Copy link
Collaborator

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable gws skills search command with synonym expansion, significantly improving skill discoverability. The accompanying refactoring to a hierarchical skills/references/ directory structure is a good organizational improvement. The new link-checking script is also a welcome addition for repository maintenance. My review identifies one high-severity security issue related to sanitizing user input in the new search command to prevent terminal escape sequence injection.

raw_tokens.push(token.to_lowercase());
}
}
let query_display = raw_tokens.join(" ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The search query is constructed from user input and later printed to the terminal on line 139 without sanitization. This could allow for terminal escape sequence injection if a user provides a malicious query, for example by including ANSI escape codes in the search terms. It's important to sanitize any user-provided string before printing it to the terminal. This aligns with the general rule to 'Sanitize error strings printed to the terminal to prevent escape sequence injection.' While this is not an error string, the principle of sanitizing all user-controlled output applies.

Suggested change
let query_display = raw_tokens.join(" ");
let query_display: String = raw_tokens.join(" ").chars().filter(|c| !c.is_control()).collect();
References
  1. Sanitize error strings printed to the terminal to prevent escape sequence injection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: core Core CLI parsing, commands, error handling, utilities area: docs area: skills

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants